Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

video: Only prefer Wayland if fifo-v1 and commit-timing-v1 are available #9383

Merged
merged 2 commits into from
Aug 3, 2024

Conversation

misyltoad
Copy link
Contributor

Description

Wayland has a myriad of unresolved problems regarding surface suspension blocking forever in QueuePresent/SwapBuffers when occludedand the FIFO (vsync) implementation being fundamentally broken leading to reduced GPU-bound performance and 'barcoding' frametimes due to swapchain starvation.

There are two protocols used to solve these two problems together -- fifo-v1 and commit-timing-v1, which implement the commit queue on the compositor side, and a timestamp that frames are intended to be displayed for/discarded respectfully.

To avoid severe performance regressions for developers targeting SDL3, only pick Wayland as the default backend when these two protocols are supported -- otherwise fallback to X11/XWayland.

We do this by having two VideoBootStraps, one which is tests the preferred case, "wayland_preferred" (ie. if fifo-v1 + commit-timing-v1 are available init time), and the fallback, which is just "wayland", the same name as before, which does no such tests. Thus, forcing with SDL_VIDEO_DRIVER=wayland will go onto the fallback option, and pick Wayland always, as usual, so there is no behaviour change.

In the case that X11/XWayland is not available (ie. no DISPLAY), we will still fallback to using Wayland without these protocols available.

Existing Issue(s)

Previous MR: #9345

Sam's comment saying to do exactly this: #9345 (comment)

@misyltoad misyltoad force-pushed the prefer-wayland-with-fifo branch 2 times, most recently from 0e957ff to 1815484 Compare March 27, 2024 18:44
Wayland has a myriad of unresolved problems regarding surface suspension
blocking forever in QueuePresent/SwapBuffers when occludedand the FIFO
(vsync) implementation being fundamentally broken leading to reduced
GPU-bound performance and 'barcoding' frametimes due to swapchain
starvation.

There are two protocols used to solve these two problems together --
fifo-v1 and commit-timing-v1, which implement the commit queue on the
compositor side, and a timestamp that frames are intended to be
displayed for/discarded respectfully.

To avoid severe performance regressions for developers targeting SDL3,
only pick Wayland as the default backend when these two protocols are
supported -- otherwise fallback to X11/XWayland.

We do this by having two VideoBootStraps, one which is tests the
preferred case, "wayland_preferred" (ie. if fifo-v1 + commit-timing-v1
are available init time), and the fallback, which is just "wayland",
the same name as before, which does no such tests.
Thus, forcing with SDL_VIDEO_DRIVER=wayland will go onto the fallback
option, and pick Wayland always, as usual, so there is no behaviour
change.

In the case that X11/XWayland is not available (ie. no DISPLAY), we will
still fallback to using Wayland without these protocols available.

Signed-off-by: Joshua Ashton <[email protected]>
@misyltoad misyltoad force-pushed the prefer-wayland-with-fifo branch from 1815484 to 98a8f8e Compare March 27, 2024 18:46
@slouken
Copy link
Collaborator

slouken commented Mar 27, 2024

Thanks!

We'll leave this open for now so people can continue to test and report Wayland issues and give the proposed protocols a chance to be approved.

@slouken slouken added this to the 3.2.0 milestone Mar 27, 2024
@Conan-Kudo
Copy link
Contributor

FYI: @Zamundaaa @davidedmundson @zzag @jadahl @jonas2515 @AlanGriffiths @wmww

@Conan-Kudo
Copy link
Contributor

Also FYI: @Drakulix

@Kontrabant
Copy link
Contributor

I'm OK with this as "Wayland is the default, if your distro is new enough" since it still signals to developers that Wayland is ultimately the preferred default and shouldn't be ignored in their Vulkan/GL loader code, and avoids any potential drama that would inevitably accompany changing the default after the final release.

Incidentally, I like this pattern, and I'm thinking this could be applied to a solution that was suggested for "Pipewire by default, but only if set to mix audio", as well.

@Conan-Kudo
Copy link
Contributor

Incidentally, I like this pattern, and I'm thinking this could be applied to a solution that was suggested for "Pipewire by default, but only if set to mix audio", as well.

These are not equivalent scenarios. On systems with PipeWire audio servers, it doesn't matter which API you use, everything will be going to PipeWire anyway. Using the PipeWire API means a richer interface which can make things easier for other parts of the user's stack.

Copy link
Collaborator

@flibitijibibo flibitijibibo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a good solution, though I wonder if, between this and PipeWire, we should make it easier to have the pattern of "this thing is better if x/y/z, otherwise use a different thing". PW is a bit different in that our solution for that is looking to be "if we connect Pulse and it turns out it's PipeWire, bail so we can go straight to PW," if it's possible to do this for X (I think Flatpak's sandbox does this somehow?) that would simplify this further.

In any case, definitely better than a full revert ✔️

@Kontrabant
Copy link
Contributor

Since we are setting a minimum baseline with these protocols, we could probably get rid of our own internal EGL presentation workaround hack needed for when the window is occluded and the frame callback stops as well.

@Conan-Kudo
Copy link
Contributor

Since we are setting a minimum baseline with these protocols, we could probably get rid of our own internal EGL presentation workaround hack needed for when the window is occluded and the frame callback stops as well.

Is this actually fixed somewhere or by something?

@Kontrabant
Copy link
Contributor

Kontrabant commented Mar 27, 2024

Since we are setting a minimum baseline with these protocols, we could probably get rid of our own internal EGL presentation workaround hack needed for when the window is occluded and the frame callback stops as well.

Is this actually fixed somewhere or by something?

Currently, if you use SDL's internal GL presentation function, SDL always sets the EGL swap interval to 0 and busy-waits on the frame callback with a timeout when presenting in order to avoid the issue where presentation can block forever if the frame callback stops, which is messy and undesirable as it doesn't allow swap intervals greater than 1 and can have timing issues. These protocols fix the blocking-forever issue, and if they are basically required to use Wayland without forcing it on, we can probably just get rid of our own internal mess in the presentation path.

This workaround is only for the internal Wayland_GLES_SwapWindow function, so it doesn't help if using Vulkan, or if apps/games directly call the presentation functions themselves. Still, I think everyone who has had the displeasure of dealing with the workaround would love to see it gone in favor of things working properly :)

Copy link
Contributor

@Kontrabant Kontrabant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A general problem that I see with this approach is that, if these protocols are present, the video driver name will be set to "wayland_preferred", which can break any code that does something like

if (SDL_strcmp(SDL_GetCurrentVideoDriver(), "wayland") == 0) {
  // Do Wayland stuff
}

This breaks testnative with

ERROR: Couldn't find native window code for wayland_preferred driver

and probably other applications, as it's a common pattern.

The name of the driver should be reported as "wayland" regardless of whether the preferred or fallback bootstrap struct was used.

I'm thinking that it would be easiest to modify the code in SDL_video.c that loops over the video drivers to check all the list items for a given name instead of bailing as soon as the first one is found, as then the list can have multiple entries with the same name for cases like this.

Line 510 in SDL_video.c

break;

becomes

if (video) {
    break;
}

@@ -398,6 +446,18 @@ static SDL_VideoDevice *Wayland_CreateDevice(void)
}
}

/*
* If we are checking for preferred Wayland, then let's query for
* fifo-v1 and commit-timing-v1's existance so we don't regress
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo

Suggested change
* fifo-v1 and commit-timing-v1's existance so we don't regress
* fifo-v1 and commit-timing-v1's existence so we don't regress

@flibitijibibo
Copy link
Collaborator

Been thinking about this for FNA3D/GPU and one possible solution to the "preferred" issue is adding a field to the bootstrap system like SDL_bool strict;, where most are not strict but occasionally you might have a case where a driver may be preferred over another, but with more strict requirements.

You could maybe add the bool to both the bootstrap struct as well was the Create function, so Wayland's create would still be one function and the bootstraps for Wayland and WaylandStrict would be identical minus the bool value. It would then try WaylandStrict, X11, then Wayland in that order. (This would be more helpful with GPU where Vulkan support might be preferred over D3D12, but only if it has a bunch of shiny new extensions that make it perform better)

@cgutman
Copy link
Collaborator

cgutman commented Mar 30, 2024

Since we are setting a minimum baseline with these protocols, we could probably get rid of our own internal EGL presentation workaround hack needed for when the window is occluded and the frame callback stops as well.

I'd prefer not to do this because it will be regressing apps that already opt-in to the Wayland backend today. Users may also override SDL_VIDEODRIVER for their own reasons and we don't want to make their experience worse.

@slouken
Copy link
Collaborator

slouken commented Mar 30, 2024

Since we are setting a minimum baseline with these protocols, we could probably get rid of our own internal EGL presentation workaround hack needed for when the window is occluded and the frame callback stops as well.

I'd prefer not to do this because it will be regressing apps that already opt-in to the Wayland backend today. Users may also override SDL_VIDEODRIVER for their own reasons and we don't want to make their experience worse.

Agreed.

@Kontrabant
Copy link
Contributor

Kontrabant commented Mar 30, 2024

Potential solution to the problem where one video driver now has multiple names:
Kontrabant@4083c58

Allows multiple entries to have the same name by trying all of the entries that match the string before reporting failure, and deduplicates the driver list available to client applications.

@slouken
Copy link
Collaborator

slouken commented Mar 30, 2024

Potential solution to the problem where one video driver now has multiple names: Kontrabant@4083c58

Allows multiple entries to have the same name by trying all of the entries that match the string before reporting failure, and deduplicates the driver list available to client applications.

This seems like a good solution to me.

Allow multiple bootstrap entries for a single video driver with the same name, which internally allows preferential and fallback init conditions while hiding the implementation details from applications (e.g. applications will just see "wayland", regardless of whether it's using the preferred or fallback driver list entry).

If a driver is requested, all instances of it in the list will be tried before reporting failure, and client applications programmatically enumerating the video drivers will be presented with a deduplicated list of entries.
Copy link
Collaborator

@flibitijibibo flibitijibibo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This idea is still good but I think we now have a better way to handled the "preferred" driver model: The bootstrap logic should be redone to look more like #9473's, which will avoid the string duplication issue.

@Kontrabant
Copy link
Contributor

I have the needed changes in a separate branch, I'll just push the changes onto this pull.

@misyltoad
Copy link
Contributor Author

Go for it

@Kontrabant
Copy link
Contributor

Pushed the commit to mirror the Pipewire-by-default changes.

@slouken
Copy link
Collaborator

slouken commented Apr 15, 2024

Great, let's hold on this for now and we'll discuss when in the SDL3 release timeline we want to merge it.

@slouken slouken modified the milestones: 3.2.0, 3.0 ABI May 22, 2024
@slouken
Copy link
Collaborator

slouken commented Jul 8, 2024

We're getting closer to the ABI milestone, do we know at this point what Wayland implementations are going to support these extensions?

@misyltoad
Copy link
Contributor Author

misyltoad commented Jul 8, 2024

So far, 0, as neither has been merged anywhere yet and the bikeshed is currently purple.

(However, it isn't really a problem on Gamescope cause we have a Vulkan layer that does the same thing (obviously doesn't fix it for GL unless you use Zink))

@slouken
Copy link
Collaborator

slouken commented Jul 8, 2024

What is the path forward for existing compositors?

@Conan-Kudo
Copy link
Contributor

Conan-Kudo commented Jul 8, 2024

Well, it's possible for existing compositors to ship draft versions now. That's what KWin is doing for xdg-color-management, for example. That also has the side-effect of significantly speeding up the finalization of the protocol, which is why I recommend it.

So getting KWin (@davidedmundson / @Zamundaaa), Mutter (@jadahl), wlroots (@emersion), Mir (@AlanGriffiths / @RAOF), Smithay/COSMIC (@Drakulix), and/or Weston (@fooishbar) to implement them will help. Any three of these compositor frameworks implementing the necessary protocols successfully will result in the protocol being finalized and landed.

@AlanGriffiths
Copy link

For Mir, these protocols are not yet included in our planned deliverables. That is unlikely to change before November, but "PRs welcome"

@slouken
Copy link
Collaborator

slouken commented Aug 3, 2024

Just to be clear, this PR changes the default to XWayland so SDL applications get the highest performance out of the box. Users and application developers can still specify that they prefer Wayland by setting SDL_HINT_VIDEO_DRIVER to "wayland,x11".

As compositors bring extensions online to address this issue, we will support them in SDL.

@slouken slouken merged commit ca2c9f6 into libsdl-org:main Aug 3, 2024
37 checks passed
@misyltoad
Copy link
Contributor Author

I made an MR to add support for a frog-fifo-v1 implementation to Mesa here, given upstream still has not been merged.

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31329

There is also a corresponding MR to KWin (KDE Plasma) available here: https://invent.kde.org/plasma/kwin/-/merge_requests/6474

Hopefully, these can be merged and we can add a check for these and start flipping that default if these are present in SDL.

@Kontrabant
Copy link
Contributor

I'm guessing they hashed things out at XDC, because the upstream FIFO and commit timing protocols were just merged, so it's just a matter of compositor and driver support now.

@Conan-Kudo
Copy link
Contributor

Yes, we got it hashed out and merged it at the beginning of our Wayland workshop. 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants